QUESTION 1

  1. Go to CANVAS and download the data for PS1 Monthly.xlsx. The data was downloaded from the CRSP (The Center for Research in Security Prices) data base via WRDS (Wharton Research Service) that is available for LBS students. Note the file contains data about Microsoft, Exxon Mobil (previously Exxon), General Electric, JP Morgan Chase (previously Chemical Banking and Chase Manhattan), Intel, Citigroup (previously Primerica and Travellers Group). In addition, the columns vwretd (ewretd) and vwredx (ewretx) contain value-weighted (equal-weighted) total returns and total returns excluding dividends for the CRSP index that contains stocks from NYSE, AMEX, and NASDAQ. Finally, sprtrn contains the total return for the S&P 500 Composite Index.
#loading in PS1 monthly data
PS1_Monthly <- read_excel("PS1_Monthly.xlsx") %>% 
  mutate(DATE = lubridate::ymd(date)) %>% #fix `date`
  select(1,2,19,3:18) #reorder the columns logically

#addressing missingness in DIVAMT by imputing with 0
PS1_Monthly[is.na(PS1_Monthly)] <- 0

QUESTION 2

  1. Make sure you understand how the holding period returns (RET) are calculated given the (unadjusted) prices (PRC), dividends (DIVAMT) and adjustment for the number of shares (CFACPR, i.e., the adjusted price equals PRC/CFACPR). That is, replicate the returns using the raw data. You can find more information about the variables on the CRSP page online. Calculate returns where you omit the dividends, i.e., you focus solely on capital gains (call these returns RETX).
#creating a testing data frame
testing <- PS1_Monthly %>% 
  select(3,4,9,7,13,10,14)

The Calculation of Holding Period Returns (RET) is as follows:

\(RET_t \ = \ \frac{(PRC_t/CFACPR_t)+DIVAMT_t}{PRC_{t-1}/CFACPR_{t-1}}-1\)

where \(PRC_t/CFACPR_t\) is the share-adjusted stock price at time \(t\), and \(DIVAMT_t\) is the dividend payout at time t.

Creating a RET Calculator

#creating a function to calculate RET with the arguments: (I) price (PRC), (II) dividends (DIVAMT) and number of shares adjustment (CFACPR)
RET_calculator <- function(PRC, DIVAMT, CFACPR) {
  
  #copying the price column of the data frame
  RET = PRC
  
  #setting first period returns to zero
  RET[1] = 0 
  
  #looping over the rows to calculate and assign values to RET 
  for (n in 2:length(PRC)) {
    
    #calculating RET using the stated formula
    RET[n] = round(((PRC[n]/CFACPR[n] + DIVAMT[n])/(PRC[n-1]/CFACPR[n-1]) - 1),6)
  }
  
  #outputting the RET vector
  return(RET)
}

*Testing the RET Calculator**

#testing the RET calculator for all the stocks in the data frame
testing<-testing %>% 
  group_by(TICKER) %>% 
  mutate(RET_rep = RET_calculator(PRC, DIVAMT, CFACPR))

#evaluating whether the RET calculation has been accurate
summary(testing$RET==testing$RET_rep)
##    Mode   FALSE    TRUE 
## logical     298    1519
#consequently, we find that in the vast majority of cases (1519) this calculation is successful - though in 298 instances the calculated RET_rep is not identical, suggesting our formulation is imperfect

The Calculation of Holding Period Returns, Omitting Dividends (RETX) is as follows:

\(RETX_t \ = \ \frac{PRC_t/CFACPR_t}{PRC_{t-1}/CFACPR_{t-1}}-1\)

where \(PRC_t/CFACPR_t\) is the share-adjusted stock price at time \(t\).

We now repeat the same calculation process, omitting DIVAMT from the equation.

Creating a RETX Calculator

#creating a function to calculate RETX with the arguments: (I) price (PRC), and (II) number of shares adjustment (CFACPR)
RETX_calculator <- function(PRC, CFACPR) {
  
  #copying the price column of the data frame
  RETX = PRC
  
  #setting first period returns to zero
  RETX[1] = 0 
  
  #looping over the rows to calculate and assign values to RETX 
  for (n in 2:length(PRC)) {
    
    #calculating RETX using the stated formula
    RETX[n] = round(((PRC[n]/CFACPR[n])/(PRC[n-1]/CFACPR[n-1]) - 1),6)
  }
  
  #outputting the RETX vector
  return(RETX)
}

Testing the RETX Calculator

#testing the RET calculator for all the stocks in the data frame
testing<-testing %>% 
  group_by(TICKER) %>% 
  mutate(RETX_rep = RETX_calculator(PRC, CFACPR))

#evaluating whether the RET calculation has been accurate
summary(testing$RETX==testing$RETX_rep)
##    Mode   FALSE    TRUE 
## logical      62    1755
#consequently, we find that in the vast majority of cases (1755) this calculation is successful - though in 62 instances the calculated RETX_rep is not identical, suggesting our formulation is imperfect

QUESTION 3

  1. Use the holding period returns to create a total return index for the MSFT and GE stocks and the S&P 500 index, which shows the theoretical growth in value of an investment in the stock assuming that dividends are reinvested (normalize the start value to 1). Do the same for the returns that abstract from dividend payments (i.e., use RETX instead). Plot the investments with and without dividends for each stock separately. How do dividends affect the results stock by stock?

The total return index for a given stock is: \(Price_t = Price_{t-1}(1+Return_t)\)

Applying a Cumulative Product Approach to Calculate Total Return Indexes

#creating the total return index for MSFT
MSFT_monthly <-PS1_Monthly %>% 
  group_by(TICKER) %>% 
  filter(TICKER=="MSFT") %>% 
  mutate(
    
    TR_RET = cumprod(1*(1+RET)),
    TR_RETX = cumprod(1*(1+RETX)),
    TR_SP500 = cumprod(1*(1+sprtrn)),
        
         )

#using the TR function to create the total return index for GE
GE_monthly <-PS1_Monthly %>% 
  group_by(TICKER) %>% 
  filter(TICKER=="GE") %>% 
  mutate(
    
    TR_RET = cumprod(1*(1+RET)),
    TR_RETX = cumprod(1*(1+RETX)),
    TR_SP500 = cumprod(1*(1+sprtrn)),
        
         )

#note that due to duplicated dates, using a cumulative product approach (or a function approach) on sprtrn (to calculate the S&P500 total return) will not produce an accurate total return index

Plotting the Index for MSFT

#for MSFT both with and without dividends
ggplot()+
  geom_line(data=MSFT_monthly,aes(x=DATE,y=TR_RET),color="red")+
  geom_line(data=MSFT_monthly,aes(x=DATE,y=TR_RETX),color="blue")+
  geom_line(data=MSFT_monthly,aes(x=DATE,y=TR_SP500),color="gray45")+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="MSFT | Microsoft Corporation", subtitle="Monthly Total Return Index <span style='color:red;'>Including Dividends</span>, <span style='color:blue;'>Excluding Dividends</span> and for the <span style='color:gray45;'>S&P500</span></span>",x="Date",y="Total Return")+
  scale_y_continuous(labels=scales::dollar_format(), breaks=seq(0,120,by=10))+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  theme(plot.subtitle = element_markdown())

Plotting the Index for GE

#for GE both with and without dividends
ggplot()+
  geom_line(data=GE_monthly,aes(x=DATE,y=TR_RET),color="red")+
  geom_line(data=GE_monthly,aes(x=DATE,y=TR_RETX),color="blue")+
  geom_line(data=GE_monthly,aes(x=DATE,y=TR_SP500),color="gray45")+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="GE | General Electric Company", subtitle="Monthly Total Return Index <span style='color:red;'>Including Dividends</span>, <span style='color:blue;'>Excluding Dividends</span> and for the <span style='color:gray45;'>S&P500</span></span>",x="Date",y="Total Return")+
  scale_y_continuous(labels=scales::dollar_format(), breaks=seq(0,16,by=2), limits=c(0,16))+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  theme(plot.subtitle = element_markdown())

These two plots clearly demonstrate that including dividends results in a substantially higher total return over time for both stocks surveyed.

Notably (as we would expect), the divergence of total returns including dividends from total returns excluding dividends increases consistently over the period as the returns accruing from dividends compound. More specifically, for MSFT, the inclusion of dividend payments results in a total return of ~$115 over the entire period from an initial $1 investment, over 1.5x greater than the return excluding dividend payments, of just ~$74. Similarly, for GE, the full-period return including dividends is ~$10 is 2x greater than that excluding dividend payments, of just $4.9.

Thus, arguably dividends have a greater effect on total returns across the full period for GE stock than MSFT stock, despite MSFT having considerably larger total returns overall. Again, this is to be expected, since MSFT only began to issue dividends in January 2003, while GE issued dividends throughout the period surveyed, allowing total returns accruing from dividends to compound over a considerably longer time period.

QUESTION 4

  1. Normalize the price of GE using the adjusted number of shares. Plot the adjusted price against the unadjusted prices. Discuss.
#normalizing the price by adjusting for number of shares
GE_monthly <- GE_monthly %>% 
  mutate(PRC_norm = PRC/CFACPR)

#plotting the adjusted prices against the unadjusted prices
ggplot()+
  geom_line(data=GE_monthly,aes(x=DATE,y=PRC),color="red")+
  geom_line(data=GE_monthly,aes(x=DATE,y=PRC_norm),color="blue")+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="GE | General Electric Company", subtitle="<span style='color:red;'>Nominal Price</span> and <span style='color:blue;'>Share-Adjusted Price</span> </span>",x="Date",y="Price")+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  scale_y_continuous(labels=scales::dollar_format(), breaks=seq(0,160,by=20), limits=c(0,160))+
  theme(plot.subtitle = element_markdown())

Reviewing the relationship between GE’s nominal and adjusted share prices, it is clear by the steep and immediate drop in nominal share price, to the exact level of the adjusted share price, that there was a stock split. Indeed, this drop from approximately $160 per share, to approximately $55 per share, is consistent with GE’s 3 for 1 stock split in mid-2000, as discussed in GE’s 2000 Annual Report.

Alongside the dividend comparison plots above, this demonstrates dividend payments and price adjustment (not just nominal prices) are critical components of assets which must be factored into returns.

QUESTION 5

  1. The holding period returns are normal returns. Generate a new variable that contains the corresponding log returns (LRET). Calculate the mean, variance, skewness, and kurtosis of the normal and the log returns. Plot the normal against the log returns for MSFT. Briefly discuss your results!

Log returns are calculated as follows:

\(LRET = ln(PRC_t/PRC_{t-1}) = ln(1+RET)\),

where \(PRC_t\) is the asset price at time \(t\) and \(RET=(PRC_t/PRC_{t-1}-1)\)

#creating the log returns variable
PS1_Monthly <- PS1_Monthly %>% 
    mutate(LRET=log(1+RET)) 

#calculating the mean, variance, skewness, and kurtosis of the normal (RET) and the log (LRET) returns
statistics_monthly <- PS1_Monthly %>% 
    group_by(TICKER) %>% 
  
#using the library(PerformanceAnalytics) to calculate the statistics
    summarise(
      
      mean_normal = mean(RET),
      variance_normal = var(RET),
      skewness_normal = skewness(RET),
      kurtosis_normal = kurtosis(RET),
      
      mean_log = mean(LRET),
      variance_log = var(LRET),
      skewness_log = skewness(LRET), 
      kurtosis_log = kurtosis(LRET)
      
      )
## `summarise()` ungrouping output (override with `.groups` argument)
print(statistics_monthly)
## # A tibble: 12 x 9
##    TICKER mean_normal variance_normal skewness_normal kurtosis_normal mean_log
##    <chr>        <dbl>           <dbl>           <dbl>           <dbl>    <dbl>
##  1 C          0.00300         0.0161         3.12e- 1           8.21  -0.00553
##  2 CCI        0.164           0.0179        -2.78e-16          -2      0.148  
##  3 CHL        0.0204          0.0105        -1.64e- 1           2.20   0.0151 
##  4 CMB        0.0192          0.0120        -2.07e- 1           0.977  0.0131 
##  5 GE         0.0103          0.00521       -1.84e- 1           1.52   0.00763
##  6 INTC       0.0197          0.0118        -2.35e- 1           1.11   0.0135 
##  7 JPM        0.00857         0.00823       -1.87e- 1           1.14   0.00440
##  8 MSFT       0.0200          0.00866        4.44e- 1           2.04   0.0157 
##  9 PA         0.0281          0.0106        -6.86e- 2           0.347  0.0227 
## 10 TRV        0.0245          0.00866       -1.06e+ 0           2.48   0.0197 
## 11 XOM        0.00785         0.00251        3.00e- 1           1.68   0.00659
## 12 XON        0.0142          0.00170        5.21e- 1           0.740  0.0133 
## # … with 3 more variables: variance_log <dbl>, skewness_log <dbl>,
## #   kurtosis_log <dbl>

These distributions demonstrate how the logged returns of key assets such as MSFT and GE have a lower kurtosis than the normal returns, and less left-skewed. This is to be expected since taking the log of returns adjusts for the effect of a substantial number of positive outliers.

Aside from kurtosis and skew (which is represented by the logged mean return being consistently smaller than the normal mean return), the variance of the normal returns and logged returns are similar: the logged return variance being only marginally smaller than the normal return variance, again due to mitgation of the effects of outliers at the positive extreme.

#(further) adding the total returns LRET series for MSFT
MSFT_monthly <-MSFT_monthly %>% 
  mutate(LRET=log(1+RET))

#plotting normal and log returns for MSFT
ggplot()+
  geom_line(data=MSFT_monthly,aes(x=DATE,y=RET),color="red")+
  geom_line(data=MSFT_monthly,aes(x=DATE,y=LRET),color="blue")+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="MSFT | Microsoft Corporation", subtitle="<span style='color:red;'>Normal Returns</span> and <span style='color:blue;'>Log Returns</span> </span>",x="Date",y="Return")+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  scale_y_continuous(labels=scales::percent_format(), breaks=seq(-0.4,0.4,by=0.05), limits=c(-0.4,0.4))+
  theme(plot.subtitle = element_markdown())

Conclusions on Logged vs. Normal Returns

This plot demonstrates how logged returns closely approximate normal returns. However, consistent with the statistical distributions themselves, normal return peaks are marginally higher than those for logged returns, which adjust for outliers. This means that, statistically, logged returns have marginally lower: mean and variance, less positive (more negative) left skewness and lower kurtosis, than normal returns.

Ultimately the log formulation is powerful because it allows us to add asset returns in order to estimate total returns over time, greatly simplifying returns analysis.

QUESTION 6

  1. Go to CANVAS and download the data for PS1 Daily.xlsx. This file contains two worksheets. HPR Daily contains the daily holding period returns for the six stocks, the S&P 500 Composite Index and the value-weighted market portfolio (including dividends) from CRSP. Prices Daily contains the prices for the six stocks and the S&P 500 Composite Index.
HPR_daily <- read_xlsx("PS1_Daily.xlsx", sheet = "HPR_daily", skip = 1) %>% 
    mutate(DATE = lubridate::ymd(DATE))  # fix `date`

Prices_daily <- read_xlsx("PS1_Daily.xlsx", sheet = "Prices_daily", skip = 1) %>% 
    mutate(DATE = lubridate::ymd(DATE))  # fix `date`
#pivoting prices daily
Prices_daily_long <-Prices_daily %>% 
  pivot_longer(cols = c(2:8),names_to = "TICKER",values_to = "PRC") 

#pivoting holding period returns
HPR_daily_long <-HPR_daily %>% 
  pivot_longer(cols = c(2:9),names_to = "TICKER",values_to = "RET")

QUESTION 7

  1. Construct a daily total return index for MSFT and GE stocks and the S&P 500 index and plot them against each other. Compare your results with the monthly total return indices from above. Are there any differences? Discuss.
#creating MSFT daily total return index
MSFT_daily <-HPR_daily_long %>% 
  group_by(TICKER) %>% 
  filter(TICKER=="MSFT") %>% 
  mutate(TR_RET = cumprod(1*(1+RET)))

#creating GE daily total return index
GE_daily <-HPR_daily_long %>% 
  group_by(TICKER) %>% 
  filter(TICKER=="GE") %>% 
  mutate(TR_RET = cumprod(1*(1+RET)))
    
#creating SP500 daily total return index (this now works, because pivoting longer has removed duplicate date entries)
SP500_daily <-HPR_daily_long %>% 
  group_by(TICKER) %>% 
  filter(TICKER=="SPRTRN") %>% 
  mutate(TR_RET = cumprod(1*(1+RET)))

Monthly Total Return Indexes

Plotting the Monthly Total Return Index for MSFT

#(monthly) plotting MSFT in comparison to the SP500
monthly_returns_MSFT <- ggplot()+
  geom_line(data=MSFT_monthly,aes(x=DATE,y=TR_RET),color="orange")+
  geom_line(data=MSFT_monthly,aes(x=DATE,y=TR_SP500),color="gray45")+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="MSFT | Monthly Total Returns Comparison", subtitle="Monthly Total Returns Index for <span style='color:orange;'>MSFT</span> and the <span style='color:gray45;'>S&P500</span> </span>",x="Date",y="Total Return")+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  scale_y_continuous(labels=scales::dollar_format(), breaks=seq(0,120,by=10), limits=c(0,120))+
  theme(plot.subtitle = element_markdown())

Plotting the Monthly Total Return Index for GE

#(monthly) plotting GE in comparison to the SP500
monthly_returns_GE <- ggplot()+
  geom_line(data=GE_monthly,aes(x=DATE,y=TR_RET),color="purple")+
  geom_line(data=MSFT_monthly,aes(x=DATE,y=TR_SP500),color="gray45")+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="GE | Monthly Total Returns Comparison", subtitle="Monthly Total Returns Index for <span style='color:purple;'>GE</span> and the <span style='color:gray45;'>S&P500</span> </span>",x="Date",y="Total Return")+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  scale_y_continuous(labels=scales::dollar_format(), breaks=seq(0,15,by=1), limits=c(0,15))+
  theme(plot.subtitle = element_markdown())

Daily Total Return Indexes

Plotting the Index for MSFT

#(daily) plotting MSFT and GE in comparison to each other and SP500
daily_returns_MSFT <- ggplot()+
  geom_line(data=MSFT_daily,aes(x=DATE,y=TR_RET),color="orange")+
  geom_line(data=SP500_daily,aes(x=DATE,y=TR_RET),color="gray45")+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="MSFT | Daily Total Returns Comparison", subtitle="Daily Total Returns Index for <span style='color:orange;'>MSFT</span> and the <span style='color:gray45;'>S&P500</span> </span>",x="Date",y="Total Return")+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  scale_y_continuous(labels=scales::dollar_format(), breaks=seq(0,120,by=10), limits=c(0,120))+
  theme(plot.subtitle = element_markdown())

Plotting the Index for GE

#(daily) plotting MSFT and GE in comparison to each other and SP500
daily_returns_GE <- ggplot()+
  geom_line(data=GE_daily,aes(x=DATE,y=TR_RET),color="purple")+
  geom_line(data=SP500_daily,aes(x=DATE,y=TR_RET),color="gray45")+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="GE | Daily Total Returns Comparison", subtitle="Daily Total Returns Index for <span style='color:purple;'>GE</span> and the <span style='color:gray45;'>S&P500</span> </span>",x="Date",y="Total Return")+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  scale_y_continuous(labels=scales::dollar_format(), breaks=seq(0,15,by=1), limits=c(0,15))+
  theme(plot.subtitle = element_markdown())
#plotting the two alongside one another
(daily_returns_MSFT+monthly_returns_MSFT)/(daily_returns_GE+monthly_returns_GE)

Comparing the daily and monthly total return indexes shown above, it is clear that each asset demonstrates identical trends over the course of the period, though (as we would expect) the daily returns index exhibits considerably greater price volatility, which is smoothed by monthly returns calculations.

QUESTION 8

  1. As before, the holding period returns are normal returns. Create log returns. Calculate the mean, variance, skewness, and kurtosis of the normal and log returns. Compare and discuss your results with the results from monthly frequency.
#creating the log returns variable, LRET = ln(1+RET) = ln(PRC/lag(PRC))
HPR_daily_long <-HPR_daily_long %>% 
  mutate(LRET=log(1+RET))

#calculating the mean, variance, skewness, and kurtosis of the normal (RET) and the log (LRET) returns
statistics_daily <- HPR_daily_long %>% 
    group_by(TICKER) %>% 
  
#using the library(PerformanceAnalytics) to calculate the statistics
    summarise(
      
      mean_normal = mean(RET),
      variance_normal = var(RET),
      skewness_normal = skewness(RET),
      kurtosis_normal = kurtosis(RET),
      
      mean_log = mean(LRET),
      variance_log = var(LRET),
      skewness_log = skewness(LRET), 
      kurtosis_log = kurtosis(LRET)
      
      )
## `summarise()` ungrouping output (override with `.groups` argument)
#comparing daily and monthly normal and logged statistics
print(statistics_daily)
## # A tibble: 8 x 9
##   TICKER mean_normal variance_normal skewness_normal kurtosis_normal mean_log
##   <chr>        <dbl>           <dbl>           <dbl>           <dbl>    <dbl>
## 1 C         0.000648        0.000887          1.34             45.0  0.000209
## 2 GE        0.000517        0.000323          0.286             8.89 0.000356
## 3 INTC      0.000918        0.000617         -0.0923            5.22 0.000609
## 4 JPM       0.000737        0.000631          0.716            11.5  0.000425
## 5 MSFT      0.000958        0.000433          0.250             5.80 0.000742
## 6 SPRTRN    0.000345        0.000130         -0.0563            8.87 0.000280
## 7 VWRETD    0.000428        0.000126         -0.163             8.72 0.000365
## 8 XOM       0.000545        0.000221          0.308             9.63 0.000435
## # … with 3 more variables: variance_log <dbl>, skewness_log <dbl>,
## #   kurtosis_log <dbl>
print(statistics_monthly)
## # A tibble: 12 x 9
##    TICKER mean_normal variance_normal skewness_normal kurtosis_normal mean_log
##    <chr>        <dbl>           <dbl>           <dbl>           <dbl>    <dbl>
##  1 C          0.00300         0.0161         3.12e- 1           8.21  -0.00553
##  2 CCI        0.164           0.0179        -2.78e-16          -2      0.148  
##  3 CHL        0.0204          0.0105        -1.64e- 1           2.20   0.0151 
##  4 CMB        0.0192          0.0120        -2.07e- 1           0.977  0.0131 
##  5 GE         0.0103          0.00521       -1.84e- 1           1.52   0.00763
##  6 INTC       0.0197          0.0118        -2.35e- 1           1.11   0.0135 
##  7 JPM        0.00857         0.00823       -1.87e- 1           1.14   0.00440
##  8 MSFT       0.0200          0.00866        4.44e- 1           2.04   0.0157 
##  9 PA         0.0281          0.0106        -6.86e- 2           0.347  0.0227 
## 10 TRV        0.0245          0.00866       -1.06e+ 0           2.48   0.0197 
## 11 XOM        0.00785         0.00251        3.00e- 1           1.68   0.00659
## 12 XON        0.0142          0.00170        5.21e- 1           0.740  0.0133 
## # … with 3 more variables: variance_log <dbl>, skewness_log <dbl>,
## #   kurtosis_log <dbl>

Conclusions on Logged vs. Normal Returns for DAILY vs. MONTHLY Returns

As for Monthly returns, we find that logged daily returns demonstrate marginally lower: mean and variance, less positive (more negative) left skewness and lower kurtosis, than normal returns.

As would be expected, relative to monthly returns, normal daily returns have a considerably lower mean and variance, are much more strongly positively (left) skewed, and have considerably higher kurtosis.

Evidently, calculation of (normal) returns on a monthly basis not only smooths out price volatility (variance), but results in a distribution which more closely approximates the normal distribution (no leptokurtosis or ‘fat tails’, no substantive positive or negative skew).

QUESTION 9

  1. Compare the statistical properties of the log holding period return time series both for monthly and daily returns. Plot a histogram and discuss how the empirical distributions relate to the normal distribution.

Logged Monthly Distribution

distribution_log_monthly <- PS1_Monthly %>% 
  ggplot(aes(x = LRET))+
  geom_histogram(binwidth = 0.01, fill="pink")+
  facet_wrap(~TICKER)+
  theme_fivethirtyeight()+
  theme(axis.title=element_text())+
  labs(title="Monthly Log Return Distribution",subtitle="Comparing the Distribution of Monthly and Daily Log Returns",x="Log Return",y="Frequency")+
  scale_x_continuous(breaks=seq(-0.5,0.5,0.1),limits=c(-0.5,0.5))

Logged Daily Distribution

distribution_log_daily <- HPR_daily_long %>% 
  ggplot(aes(x = LRET))+
  geom_histogram(binwidth = 0.01, fill="pink")+
  facet_wrap(~TICKER)+
  theme_fivethirtyeight()+
  theme(axis.title=element_text())+
  labs(title="Daily Log Return Distribution",subtitle="Comparing the Distribution of Monthly and Daily Log Returns",x="Log Return",y="Frequency")+
  scale_x_continuous(breaks=seq(-0.5,0.5,0.1),limits=c(-0.5,0.5))

Comparing the Returns Distributions

distribution_log_monthly+distribution_log_daily

For a better comparison, we compare the monthly and daily logged returns of MSFT::

MSFT | Log Returns Comparison

#specifying GE monthly log distribution
MSFT_log_dist_monthly <- PS1_Monthly %>% 
  filter(TICKER=="MSFT") %>% 
  ggplot(aes(x = LRET))+
  geom_histogram(binwidth = 0.01, fill="orange")+
  theme_fivethirtyeight()+
  theme(axis.title=element_text())+
  labs(title="Monthly Log Return Distribution",subtitle="MSFT | Microsoft Corporation",x="Log Return",y="Frequency")+
  scale_x_continuous(breaks=seq(-0.5,0.5,0.1),limits=c(-0.5,0.5))

#specifying GE daily log distribution
MSFT_log_dist_daily <- HPR_daily_long %>% 
  filter(TICKER=="MSFT") %>% 
  ggplot(aes(x = LRET))+
  geom_histogram(binwidth = 0.01, fill="orange")+
  facet_wrap(~TICKER)+
  theme_fivethirtyeight()+
  theme(axis.title=element_text())+
  labs(title="Daily Log Return Distribution",subtitle="MSFT | Microsoft Corporation",x="Log Return",y="Frequency")+
  scale_x_continuous(breaks=seq(-0.5,0.5,0.1),limits=c(-0.5,0.5))

#outputting the two distributions
MSFT_log_dist_monthly/MSFT_log_dist_daily
## Warning: Removed 2 rows containing missing values (geom_bar).

## Warning: Removed 2 rows containing missing values (geom_bar).

Though the daily returns distributions above appear to be more normally distributed than those for monthly returns, due to their regular narrow “bell” shape, since there are comparatively many observations at each extreme, the daily log returns distribution is too leptokurtic (kurtosis is too high) to be normally distributed. Thus, as concluded in Question 8, monthly returns (whether logged or not) better approximate the normal distribution than daily returns. This implies that the normal distribution assumption is more likely to hold for lower frequency returns, such as monthly, rather than daily, returns.

QUESTION 10

  1. Pick three stocks and the S&P 500 index (either you can use MSFT, GE and JPM or adapt the code to pick three random stocks). You will need the holding period returns (both normal and log returns) and the total return indices you created.

We prepare the stocks data including: (I) holding period returns (RET); (II) log holding period returns (LRET); (III) total returns (TR_RET); (IV) total log returns (TR_LRET); (V) total returns from exp(log) (TR_RETEXP); and (VI) squared log returns (SQ_LRET).

#we select MSFT, GE and JPM and create a dataframe including HPR, LRET and TR_RET (total return), calculated by summing logged returns
stocks <- HPR_daily_long %>% 
  filter(TICKER == "GE"|
         TICKER == "JPM"|
         TICKER == "MSFT"|
         TICKER == "SPRTRN") %>% 
  group_by(TICKER) %>% 
  mutate(
    
    #log total return
    TR_LRET = cumsum(LRET),
    #normal total return
    TR_RET = cumprod(1*(1+RET)),
    #normal total return calculated from exp(log) - used to double-check that TR_RET is accurate (which it is)
    TR_RETEXP = exp(TR_LRET),
    #squared log returns
    SQ_LRET = log(1+RET)^2
      
    ) %>% 
  select(1,2,3,6,4,5,7,8)

QUESTION 11

  1. Calculate the covariance matrix for the log return series, using both the returns and returns squared. Discuss your results briefly.

Log Returns Covariance Matrix

stocks %>% 
  select(1,2,5) %>% 
  pivot_wider(names_from="TICKER",values_from="LRET") %>% 
  select(-1) %>% 
  cov() %>% 
  round(6)
##            MSFT       GE      JPM   SPRTRN
## MSFT   0.000431 0.000155 0.000202 0.000146
## GE     0.000155 0.000323 0.000253 0.000154
## JPM    0.000202 0.000253 0.000623 0.000198
## SPRTRN 0.000146 0.000154 0.000198 0.000130

Squared Log Returns Covariance Matrix

stocks %>% 
  select(1,2,8) %>% 
  pivot_wider(names_from="TICKER",values_from="SQ_LRET") %>% 
  select(-1) %>% 
  cov() %>% 
  round(8)
##            MSFT       GE      JPM  SPRTRN
## MSFT   1.41e-06 3.00e-07 5.90e-07 2.1e-07
## GE     3.00e-07 1.06e-06 1.15e-06 2.5e-07
## JPM    5.90e-07 1.15e-06 4.90e-06 5.1e-07
## SPRTRN 2.10e-07 2.50e-07 5.10e-07 1.8e-07

The two covariance matrices demonstrate that positive covariance exists between each pair of assets (the three chosen stocks, or the S&P 500) in terms of both log returns and log returns squared, indicating a consistently positive relationship. This means that all 3 selected stocks are likely to move in the same direction as eachother, and as the S&P500, when responding to market conditions. Notably, the covariance of the log return squared matrix is over 1/100 smaller than the corresponding log return matrix, thereby indicating a weaker positive relationship between the returns of each asset.

Within the 3 stocks picked, JPM has the strongest relationship with S&P 500, and JPM and GE are the most strongly correlated of the stocks, in terms of their logged returns. This implies that, in order to optimally diversify away idiosyncratic risk, a portfolio should not include JPM if it is to include MSFT, GE or S&P500 - and that, of the chosen stocks, MSFT is least correlated in its returns with both the other stocks and thus likely to be the best addition to a diversified portfolio.

However, we cannot rely fully on these covariances to draw inferences regarding the optimal portfolio. This is because the magnitude of covariance can be heavily skewed whenever the data set includes too many significantly different values: single outliers (which remain present in the logged returns, though to a lesser extent than normal returns) can cause dramatic under or overstatement of the relationship between asset returns.

QUESTION 12

  1. Plot the ACF (autocorrelation function) for prices, returns, returns squared, and absolute returns. Discuss the results!

Monthly (not daily) asset data is used throughout this question, and the ACF is plotted for each of the 3 chosen stocks (MSFT, GE, JPM)

For MSFT

#preparing the data for ACF
MSFT_monthly_ACF <- PS1_Monthly %>% 
  filter(TICKER=="MSFT") %>% 
  #date, price, return
  select(3,9,10) %>% 
  mutate(
    
    #returns squared
    RET_SQ = RET^2,
    #absolute returns
    RET_ABS = abs(RET)
    
  )

#I employ a maximum lag covering the full period (25 years = 300 months)

#plotting the ACF for prices
price_MSFT <- acf(MSFT_monthly_ACF$PRC,lag.max=300,plot=FALSE)
plot(price_MSFT, main = "ACF of MSFT Monthly Prices") 

#plotting the ACF for returns
return_MSFT <- acf(MSFT_monthly_ACF$RET,lag.max=300,plot=FALSE)
plot(return_MSFT, main = "ACF of MSFT Monthly Returns") 

#plotting the ACF for returns squared
returnsq_MSFT <- acf(MSFT_monthly_ACF$RET_SQ,lag.max=300,plot=FALSE)
plot(returnsq_MSFT, main = "ACF of MSFT Monthly Returns Squared") 

#plotting the ACF for absolute returns
absolute_MSFT <- acf(MSFT_monthly_ACF$RET_ABS,lag.max=300,plot=FALSE)
plot(absolute_MSFT, main = "ACF of MSFT Monthly Absolute Returns") 

For GE

#preparing the data for ACF
GE_monthly_ACF <- PS1_Monthly %>% 
  filter(TICKER=="GE") %>% 
  #date, price, return
  select(3,9,10) %>% 
  mutate(
    
    #returns squared
    RET_SQ = RET^2,
    #absolute returns
    RET_ABS = abs(RET)
    
  )

#I employ a maximum lag covering the full period (25 years = 300 months)

#plotting the ACF for prices
price_GE <- acf(GE_monthly_ACF$PRC,lag.max=300,plot=FALSE)
plot(price_GE, main = "ACF of GE Monthly Prices") 

#plotting the ACF for returns
return_GE <- acf(GE_monthly_ACF$RET,lag.max=300,plot=FALSE)
plot(return_GE, main = "ACF of GE Monthly Returns") 

#plotting the ACF for returns squared
returnsq_GE <- acf(GE_monthly_ACF$RET_SQ,lag.max=300,plot=FALSE)
plot(returnsq_GE, main = "ACF of GE Monthly Returns Squared") 

#plotting the ACF for absolute returns
absolute_GE <- acf(GE_monthly_ACF$RET_ABS,lag.max=300,plot=FALSE)
plot(absolute_GE, main = "ACF of GE Monthly Absolute Returns") 

For JPM

#preparing the data for ACF
JPM_monthly_ACF <- PS1_Monthly %>% 
  filter(TICKER=="JPM") %>% 
  #date, price, return
  select(3,9,10) %>% 
  mutate(
    
    #returns squared
    RET_SQ = RET^2,
    #absolute returns
    RET_ABS = abs(RET)
    
  )

#I employ a maximum lag covering the full period (25 years = 300 months)

#plotting the ACF for prices
price_JPM <- acf(JPM_monthly_ACF$PRC,lag.max=300,plot=FALSE)
plot(price_JPM, main = "ACF of JPM Monthly Prices") 

#plotting the ACF for returns
return_JPM <- acf(JPM_monthly_ACF$RET,lag.max=300,plot=FALSE)
plot(return_JPM, main = "ACF of JPM Monthly Returns") 

#plotting the ACF for returns squared
returnsq_JPM <- acf(JPM_monthly_ACF$RET_SQ,lag.max=300,plot=FALSE)
plot(returnsq_JPM, main = "ACF of JPM Monthly Returns Squared") 

#plotting the ACF for absolute returns
absolute_JPM <- acf(JPM_monthly_ACF$RET_ABS,lag.max=300,plot=FALSE)
plot(absolute_JPM, main = "ACF of JPM Monthly Absolute Returns") 

Autocorrelation indicates the degree of correlation of the same variables between two successive time intervals. The plots above have lag numbers on the x-axis and degree of ACF on the y-axis, demonstrating different levels of autocorrelation between time series data when the lag is set to different values (with a maximum value of 300 months).

ACF for Price Evidently, MSFT and GE stocks demonstrate a similar price ACF, by which, the longer the lagged period, the lower the autocorrelation - until the autocorrelation becomes negative (due to the boom and bust cyles of markets). Subsequently, as the period lag increases further, the price becomes totally uncorrelated with the initial time interval, which is to be expected as such large time lags result in the accumulation of unpredictable and uncorrelated price influences. However, JPM shows a different, more cyclical trend (most likely because its share price performance is more closely correlated with core market drivers), by which the share price cycles between positive and negative autocorrelation as the lagged period increases, with almost no subsequent periods being entirely uncorrelated with the price at the initial time interval.

ACF for Returns (Normal, Squared, Absolute) Interestingly, the ACF functions demonstrate how while normal returns do not appear to follow a cyclical pattern, both squared and absolute returns demonstrate considerable cyclicality for all 3 stocks surveyed. For these returns transformations, we find that while autocorrelation falls steeply as the lag increases, after 10-20 lagged periods, there is considerably more regularity and cyclicality in the ACF.

This indicates that, while normal returns do not follow a particularly regular pattern as the lagged period increases, squared and absolute returns (on the basis of greater cyclicality alternating between positive and negative autocorrelation as the lag increases) demonstrate a far more predictable sequence. Consequently, it appears that squared and absolute returns are far more predictable (especially for consistently periodic stocks like GE) than normal returns or price.

QUESTION 13

  1. Use the three assets and make up a portfolio by assigning arbitrary portfolio weights. What does it imply if you keep the weights fixed over time?

Since the portfolio weight of a given asset \(\mathbb{X}\) is represented by the value of the asset as a proportion of the value of the entire portfolio, and the asset value fluctuates constantly, by keeping the portfolio weights fixed, we must constantly re-balance our portfolio.

MSFT_weight = 0.4
GE_weight = 0.3
JPM_weight = 0.3

QUESTION 14

  1. Calculate the portfolio returns and use them to calculate the evolution of a $1 investment in the portfolio over the sample period. Plot the result against the evolution of a $1 investment in each of the three stocks. Discuss the result.

Since ‘stocks’ contains all the required data, we must now manipulate to extract the required total returns.

#preparing the total returns by stock in wide format
portfolio_stocks <- stocks %>% 
  select(1,2,4) %>% 
  pivot_wider(names_from="TICKER",values_from="TR_RET") %>% 
  select(1:4)

#creating a new column containing the weighted sum of these total individual stock returns
PTFOLIO <- portfolio_stocks %>% 
  mutate(
    
    Portfolio = (MSFT*MSFT_weight + GE*GE_weight + JPM*JPM_weight)
    
  )
#transforming the dataset from wide to long format again and plotting the results
PTFOLIO %>% 
  pivot_longer(cols=c(2:5),names_to="Asset",values_to="TR_RET") %>%
  
  #plotting the returns
  ggplot(aes(x=DATE,y=TR_RET,color=Asset))+
  geom_line()+
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(title="Portfolio Returns Against Stock Returns", subtitle="The Evolution of a $1 Investment in a Weighted Portfolio of MSFT (0.4), GE (0.3) and JPM (0.3) \nAgainst the Underlying Stocks",x="Date",y="Total Return")+
  scale_x_date(date_breaks="2 years",date_labels="%Y")+
  scale_y_continuous(labels=scales::dollar_format(), breaks=seq(0,120,by=10), limits=c(0,120))+
  theme(plot.subtitle = element_markdown(), legend.position="top")

This plot implies that the portfolio composed of a weighted sum of the three stocks has a higher risk-to-reward (Sharpe) ratio, relative to each of the individual stocks.

Indeed, while both GE and JPM have relatively low risk (low variance/volatility) and low reward (with a total return of less than 20x over the period), MSFT has extremely high reward (110x return), at high risk. Contrastingly, the collective portfolio demonstrates considerably lower variance than MSFT stock, while generating approximately half the returns.

This means that investing in the weighted portfolio is more favourable for risk-averse, return-maximising investors than any of the individual stocks which compose it.

QUESTION 15

  1. Portfolio theory with matrix algebra

Note: due to the defined sample period of 02/01/1990 - 31/12/2002, this period is utilised throughout Q15.

15.1 Calculate the means, the variance and the pairwise covariances for the three stocks MSFT, GE, and JPM for the sample period between 2/1/1990 and 31/12/2002.

For clarity, we return to the core HPR_daily dataset and do not use any previously defined dataframes.

#creating a new dataframe from HPR_daily which contains only the selected stocks, filtered for the stated date range
stocks_HPR <- HPR_daily %>% 
  select (DATE, MSFT, GE, JPM) %>%
  filter (DATE >= as.Date("1990-01-02") & DATE <= as.Date("2012-12-31"))

#creating the requested summary data frame
stock_summary <- stocks_HPR %>% 
  
  #removing the date column
  select(-DATE) %>% 
  
  #pivoting the data into long format
  pivot_longer(1:3, names_to="TICKER", values_to="RET") %>% 
  
  #grouping the returns data by TICKER
  group_by(TICKER) %>% 
  
  #calculating summary statistics
  summarise (mean_return = round(mean(RET),6), variance_return = round(var(RET),6))
## `summarise()` ungrouping output (override with `.groups` argument)
#printing the stock summary
stock_summary
## # A tibble: 3 x 3
##   TICKER mean_return variance_return
##   <chr>        <dbl>           <dbl>
## 1 GE        0.000514        0.000343
## 2 JPM       0.000725        0.000674
## 3 MSFT      0.000927        0.000453
#obtaining all the pairwise covariances efficiently, via a covariance matrix
covariance_matrix <- round(var(
  
  stocks_HPR %>% 
    select(-DATE)
  
  ),6)

covariance_matrix
##          MSFT       GE      JPM
## MSFT 0.000453 0.000166 0.000216
## GE   0.000166 0.000343 0.000272
## JPM  0.000216 0.000272 0.000674
#printing each of the pairwise covariances

paste0("The pairwise covariance between MSFT and GE is ", covariance_matrix[1,2])
## [1] "The pairwise covariance between MSFT and GE is 0.000166"
paste0("The pairwise covariance between MSFT and JPM is ", covariance_matrix[1,3])
## [1] "The pairwise covariance between MSFT and JPM is 0.000216"
paste0("The pairwise covariance between GE and JPM is ", covariance_matrix[2,3])
## [1] "The pairwise covariance between GE and JPM is 0.000272"

15.2 Define the following matrices that contain returns, expected returns, portfolio weights, and covariances:

Returns/Expected Returns Matrix

We are asked to generate a matrix containing returns and a matrix containing expected returns, but these are the same matrices (containing the mean historical returns of the stocks). Therefore, I generate a single matrix for expected returns (mu).

mu <- matrix(
  
  c(
    #specifying the 3 means to be taken from the date-filtered, wide-format stocks_HPR data frame
    mean(stocks_HPR$MSFT), 
    mean(stocks_HPR$GE), 
    mean(stocks_HPR$JPM)
    
    ), 
  
  #specifying the number of columns and number of rows in the matrix object
  ncol=1, nrow=3)

#printing the returns/expected returns matrix, mu
mu
##              [,1]
## [1,] 0.0009272186
## [2,] 0.0005143367
## [3,] 0.0007253888

(Equal) Portfolio Weights Matrix

#since this portfolio is the equal weights portfolio, I call this weights matrix 'e'
e <- matrix(
  
  c(
    
    #MSFT weight
    1/3, 
    #GE weight
    1/3,
    #JPM weight
    1/3), 
  
  ncol=1, nrow=3)

Covariance Matrix

#since we already generated the covariance matrix in Q15.a, I simply rename this matrix to 'sigma' accordingly
sigma <- covariance_matrix

15.3 Calculate the return and standard deviation of a portfolio where you equal-weight the three stocks - call the portfolio e. Additionally, consider a portfolio y with a weight vector y’ = (0.8, 0.4, −0.2). Calculate the risk-return tradeoff of y as well as its covariance with portfolio e.

Equal Weight Portfolio

Return of an Equal Weight Portfolio The formula provided tells us that the expected return of a given portfolio is obtained by the dot product of: (i) the transposed portfolio weights matrix (x transposed) and (ii) mu, the expected returns matrix. In this case, the portfolio is the equal weights portfolio, meaning that the portfolio weights matrix is represented by ‘e’.

return_equal_weight <- 
  
  #this simple formulation finds the dot product (%*%) of the transposed e (equal portfolio weights) matrix with the expected returns matrix, mu
  t(e) %*% mu

return_equal_weight
##              [,1]
## [1,] 0.0007223147

Risk of an Equal Weight Portfolio The formula provided tells us that the expected risk of a given portfolio (the historical standard deviation) is given by the square root of the variance of the portfolio, which is obtained by the dot product of: (i) the transposed portfolio weights matrix (x transposed) and (ii) the dot product of (ii.i) sigma, the covariance matrix, and (ii.ii) x, the portfolio weights matrix (x). In this case, the portfolio is the equal weights portfolio, meaning that the portfolio weights matrix is represented by ‘e’.

risk_equal_weight <- 
  
  #by applying the sqrt function, I obtain the standard deviation (risk) from the variance
  sqrt(
    
    #this formulation finds the variance: the dot product (%*%) of the transposed portfolio weights matrix with the dot product of the covariance matrix and the portfolio weights matrix
    t(e) %*% (sigma %*% e)
    
    )

risk_equal_weight
##            [,1]
## [1,] 0.01756891

Unequal Weight Portfolio

#since this portfolio is the un-equal weights portfolio, I call this weights matrix 'y'
y <- matrix(
  
  c(
    
    #MSFT weight
    0.8, 
    #GE weight
    0.4,
    #JPM weight
    -0.2), 
  
  ncol=1, nrow=3)

Return of an Un-Equal Weight Portfolio I apply the same formula as above. However, in this case, the portfolio is the un-equal weights portfolio, meaning that the portfolio weights matrix is represented by ‘y’.

return_unequal_weight <- 
  
  #this simple formulation finds the dot product (%*%) of the transposed e (equal portfolio weights) matrix with the expected returns matrix, mu
  t(y) %*% mu

return_unequal_weight
##              [,1]
## [1,] 0.0008024318

Risk of an Un-Equal Weight Portfolio I apply the same formula as above. However, in this case, the portfolio is the un-equal weights portfolio, meaning that the portfolio weights matrix is represented by ‘y’.

risk_unequal_weight <- 
  
  #by applying the sqrt function, I obtain the standard deviation (risk) from the variance
  sqrt(
    
    #this formulation finds the variance: the dot product (%*%) of the transposed portfolio weights matrix with the dot product of the covariance matrix and the portfolio weights matrix
    t(y) %*% (sigma %*% y)
    
    )

risk_unequal_weight
##            [,1]
## [1,] 0.01911439

Portfolio Risk-Return Trade-Offs and Covariance

The Risk-Return Trade-Off of the Portfolios The risk-return trade-off of an asset or portfolio is summarised by the Sharpe Ratio: \[Sharpe \ Ratio = \frac{Mean \ Return}{Standard \ Deviation}\]. Therefore, I calculate the Sharpe Ratio for each portfolio (the first, defined by equal weights, e - and the second, defined by unequal weights, y).

#calculating the sharpe ratio for portfolio e
sharpe_equal_weight <- return_equal_weight/risk_equal_weight

#calculating the sharpe ratio for portfolio y
sharpe_unequal_weight <- return_unequal_weight/risk_unequal_weight

#printing the risk-return trade-offs
paste0("The risk-return trade-off (Sharpe Ratio) of the equal weights portfolio 'e' is ", sharpe_equal_weight)
## [1] "The risk-return trade-off (Sharpe Ratio) of the equal weights portfolio 'e' is 0.0411132291844826"
paste0("The risk-return trade-off (Sharpe Ratio) of the unequal weights portfolio 'y' is ", sharpe_unequal_weight)
## [1] "The risk-return trade-off (Sharpe Ratio) of the unequal weights portfolio 'y' is 0.0419805011204686"

The Covariance Between the Two Portfolios

The formula provided indicates that the covariance of the portfolio is obtained by the dot product of: (i) the first portfolio weights matrix (e or y) and (ii) the dot product of (ii.i) sigma, the covariance matrix, and (ii.ii) the second portfolio weights matrix (e or y).

#calculating the covariance between the two portfolios (an equal weights and unequal weights portfolio consisting of the same underlying assets: MSFT, GE, JPM)
PTFOLIO_covariance <- t(e) %*% (sigma %*% y)

#printing the covariance between the two portfolios
paste0("The covariance between the two portfolios is ", PTFOLIO_covariance)
## [1] "The covariance between the two portfolios is 0.000249333333333333"

15.4 In order to find the global minimum variance portfolio with weights m’ = (mMSF T , mGE, mJPM ), we have to minimise variance subject to the portfolio weights summing to 1. Calculate the variance and the expected return of the minimum variance portfolio.

Finding the Minimum Variance Portfolio (Weights)

The question provides us with the solution for the global minimum variance portfolio: \[A_mz_m = b\] with the solution \[z_m = A^{-1}_mb\], where \[A = \begin{pmatrix} 2\sigma^{2}_{MSFT} & 2\sigma_{MSFT,GE} & 2\sigma_{MSFT,JPM} & 1 \\ 2\sigma_{GE,MSFT} & 2\sigma^{2}_{GE} & 2\sigma_{GE,JPM} & 1 \\ 2\sigma_{JPM,MSFT} & 2\sigma_{JPM,GE} & 2\sigma^{2}_{JPM} & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}\]

and \[b = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}\]

Defining Matrix A Thus, we focus on defining \(A\), such that we can obtain the dot product of the inverse of this matrix with matrix \(b\) in order to obtain the solution \(z\).

#multiplying the sigma (covariance matrix) by 2 as the equation shows
A <- 2*sigma

#creating the 4th column
A <- cbind(A, c(1,1,1))

#creating the 4th row
A <- rbind(A, c(1,1,1,0))

#printing the matrix to ensure it is correct (which we find it is)
A
##          MSFT       GE      JPM  
## MSFT 0.000906 0.000332 0.000432 1
## GE   0.000332 0.000686 0.000544 1
## JPM  0.000432 0.000544 0.001348 1
##      1.000000 1.000000 1.000000 0

Defining Matrix b We now define \(b\), which is the final matrix we require to solve for \(z\).

#b, a single column matrix, is defined much more simply, using the matrix function

b <- matrix(
  
  c(
    
    0,
    0,
    0,
    1
    
    ), 
  
  ncol=1)

#printing the matrix to ensure it is correct (which we find it is)
b
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    1

Solving for Matrix z

#we find that the function 'solve()' generates the inverse of a matrix

#thus, we calculate z by taking dot product of the inverse of matrix A with matrix b
z <- solve(A) %*% b

#we check that z has the expected form (portfolio weights for each underlying asset, and an unlabelled lambda term), which it does
z
##               [,1]
## MSFT  0.3667901831
## GE    0.5769341555
## JPM   0.0562756614
##      -0.0005481651

As stated in the assignment, the first three elements of \(z\) are the portfolio weights \(m\) for the global minimum variance portfolio:

\[z = \begin{pmatrix} m_{MSFT} \\ m_{GE} \\ m_{JPM} \\ \lambda \end{pmatrix}\]

I therefore extract and transpose these weights, m, from the \(z\) matrix, discarding the \(\lambda\) term.

#I extract the first three rows, and take only the first (names) column
m <- z[1:3,1]

#this value now contains the portfolio weights for the minimum variance portfolio
m
##       MSFT         GE        JPM 
## 0.36679018 0.57693416 0.05627566

Calculating the Variance and Expected Return of the Minimum Variance Portfolio

Structuring m as a Portfolio Weights Matrix

m <- matrix(m, ncol=1)

I repeat a process similar to that in Q15.3 to calculate the variance and expected return of the minimum variance portfolio.

Expected Return of the Minimum Variance Portfolio The formula provided tells us that the expected return of a given portfolio is obtained by the dot product of: (i) the transposed portfolio weights matrix (x transposed) and (ii) mu, the expected returns matrix. In this case, the portfolio is the minimum variance portfolio, meaning that the portfolio weights matrix is represented by ‘m’.

return_minimum_variance <- 
  
  #this simple formulation finds the dot product (%*%) of the transposed m (minimum variance portfolio weights) matrix with the expected returns matrix, mu
  t(m) %*% mu

#printing the expected return
paste0("The expected return of the minimum variance portfolio is ", return_minimum_variance)
## [1] "The expected return of the minimum variance portfolio is 0.000677654826207544"

Variance of the Minimum Variance Portfolio The variance of the portfolio is obtained by the dot product of: (i) the transposed portfolio weights matrix (x transposed) and (ii) the dot product of (ii.i) sigma, the covariance matrix, and (ii.ii) x, the portfolio weights matrix (x). In this case, the portfolio is the minimum variance portfolio, meaning that the portfolio weights matrix is represented by ‘m’.

variance_minimum_variance <-
  
  (
    #this formulation finds the variance: the dot product (%*%) of the transposed portfolio weights matrix with the dot product of the covariance matrix and the portfolio weights matrix
    t(m) %*% (sigma %*% m)
    
    )

#printing the variance
paste0("The variance of the minimum variance portfolio is ", variance_minimum_variance)
## [1] "The variance of the minimum variance portfolio is 0.000274082565627273"

15.5 Find another efficient portfolio. Namely, the efficient portfolio that gives a return equal to the expected return of MSFT. Note, that your minimization problem now becomes: minimise variance subject to the portfolio weights summing to 1 and expected return is equal to the expected return of MSFT. Derive the solution as above in terms of portfolio weights and calculate them in your code. In addition, calculate the expected return and the variance of efficient portfolio x as well as its covariance with the global minimum portfolio.

We adopt the same approach as before, with a slightly different A and b matrices which includes the second condition via an additional row (and, for A, column) containing the expected (mean) returns for each stock. Thus,

\[A_mz_m = b\] with the solution \[z_m = A^{-1}_mb\], where \[A = \begin{pmatrix} 2\sigma^{2}_{MSFT} & 2\sigma_{MSFT,GE} & 2\sigma_{MSFT,JPM} & \mu_{MSFT} & 1 \\ 2\sigma_{GE,MSFT} & 2\sigma^{2}_{GE} & 2\sigma_{GE,JPM} & \mu_{MSFT}& 1 \\ 2\sigma_{JPM,MSFT} & 2\sigma_{JPM,GE} & 2\sigma^{2}_{JPM} & \mu_{MSFT} & 1 \\ \mu_{MSFT} & \mu_{MSFT} & \mu_{MSFT} & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \end{pmatrix}\]

and \[b = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \mu_{MSFT}\\ 1 \end{pmatrix}\]

Finding the Efficient Portfolio

Defining Matrix A

#multiplying the sigma (covariance matrix) by 2 as the previous equation shows
A_new <- 2*sigma

#adding the two conditions in the column direction
A_new <- cbind(A_new, mu, c(1,1,1))

#adding the new (mu) condition in the row direction
A_new <- rbind(A_new, c(t(mu), 0, 0))

#adding the first condition in the row direction
A_new <- rbind(A_new, c(1,1,1,0,0))

#printing the matrix to ensure it is correct (which we find it is)
A_new
##              MSFT           GE          JPM               
## MSFT 0.0009060000 0.0003320000 0.0004320000 0.0009272186 1
## GE   0.0003320000 0.0006860000 0.0005440000 0.0005143367 1
## JPM  0.0004320000 0.0005440000 0.0013480000 0.0007253888 1
##      0.0009272186 0.0005143367 0.0007253888 0.0000000000 0
##      1.0000000000 1.0000000000 1.0000000000 0.0000000000 0

Defining Matrix b

#b, a single column matrix, is defined much more simply, using the matrix function

b_new <- matrix(
  
  c(
    
    0,
    0,
    0,
    
    #since only the expected return of MSFT is required, I use mu[1] representing the first row
    mu[1],
    
    1
    
    ), 
  
  ncol=1)

#printing the matrix to ensure it is correct (which we find it is)
b_new
##              [,1]
## [1,] 0.0000000000
## [2,] 0.0000000000
## [3,] 0.0000000000
## [4,] 0.0009272186
## [5,] 1.0000000000

Solving for Matrix z

#again we use the function 'solve()' to generate the inverse of the matrix

#thus, we calculate z by taking dot product of the inverse of the new matrix A (including the new expected returns condition) with the new matrix b (including the expected returns condition)
z_new <- solve(A_new) %*% b_new

#we check that z has the expected form (portfolio weights for each underlying asset, and an unlabelled lambda term), which it does
z_new
##               [,1]
## MSFT  0.8949441710
## GE   -0.1004651980
## JPM   0.2055210269
##      -1.2745638923
##       0.0003155492
#I extract the first three rows, and take only the first (names) column
x <- z_new[1:3,1]

#this value now contains the portfolio weights for the minimum variance portfolio
x
##       MSFT         GE        JPM 
##  0.8949442 -0.1004652  0.2055210

Calculating the Variance and Expected Return of the Efficient Portfolio

Structuring m as a Portfolio Weights Matrix

x <- matrix(x, ncol=1)

Expected Return of the Efficient Portfolio

return_efficient <- 
  
  #this simple formulation finds the dot product (%*%) of the transposed m (efficient portfolio weights) matrix with the expected returns matrix, mu
  t(x) %*% mu

#printing the expected return
paste0("The expected return of the efficient portfolio is ", return_efficient)
## [1] "The expected return of the efficient portfolio is 0.000927218561324823"

Variance of the Efficient Portfolio

variance_efficient <-
  
  (
    #this formulation finds the variance: the dot product (%*%) of the transposed portfolio weights matrix with the dot product of the covariance matrix and the portfolio weights matrix
    t(x) %*% (sigma %*% x)
    
    )

#printing the variance
paste0("The variance of the efficient portfolio is ", variance_efficient)
## [1] "The variance of the efficient portfolio is 0.000433125028434414"

The Covariance Between the Minimum Variance Portfolio and the Efficient Portfolio

The formula provided indicates that the covariance of the portfolio is obtained by the dot product of: (i) the first portfolio weights matrix (m or x) and (ii) the dot product of (ii.i) sigma, the covariance matrix, and (ii.ii) the second portfolio weights matrix (m or x).

#calculating the covariance between the two portfolios (an equal weights and unequal weights portfolio consisting of the same underlying assets: MSFT, GE, JPM)
PTFOLIO_new_covariance <- t(m) %*% (sigma %*% x)

#printing the covariance between the two portfolios
paste0("The covariance between the efficient portfolio and the global minimum variance portfolio is ", PTFOLIO_new_covariance)
## [1] "The covariance between the efficient portfolio and the global minimum variance portfolio is 0.000274082565627273"

15.6 Plot the entire efficient frontier for the three risky assets.

In order to plot the efficient frontier, we have to obtain the volatility (standard deviation) and expected return, given a certain inputted:

  1. return (the equivalent of the second condition in the efficient portfolio, which required returns to equal MSFT returns)
  2. matrix of expected returns (mu)
  3. covariance matrix (sigma)

To achieve this, we define a function capable of repeating the steps taken in 15.6 - taking the “return” e.g. MSFT expected return as and input and outputting both the volatility and expected return of the corresponding efficient portfolio.

frontier_calculator <- function(return, mu, sigma){
  
  #we define matrix A
  A <- 2*sigma
  A <- cbind(A, mu, c(1,1,1))
  A <- rbind(A, c(t(mu), 0, 0))
  A <- rbind(A, c(1,1,1,0,0))
  
  #we define matrix b
  b = matrix(c(0, 0, 0, return, 1), ncol=1)
  
  #we solve for z
  z = solve(A) %*% b
  
  #we extract the portfolio weights
  x = z[1:3,1]
  
  #we return the expected return and volatility (standard deviation) of the portfolio
  return(
    
    c(
      
      #expected return
      t(x) %*% mu, 
      
      #standard deviation (volatility)
      sqrt(t(x) %*% sigma %*% x)
      
      )
  
    )
  
}

I now apply the function by specifying a range of return values to use as inputs:

#setting the range and step of inputted 'returns' values
returns  <- seq(
  
  from = 0, 
  to = 0.003, 
  by=0.00005)

#creating empty vectors to be populated with expected return and volatility values
volatility = rep(NA, length(returns))
expected_return = rep(NA, length(returns))

#applying the 'frontier_calculator' function by looping it over all the values in the 'returns' list to obtain the volatilities and expected returns required for the efficient frontier plot

for (i in 1:length(returns)){
  
  #we run the function using our previously calculated mu and sigma matrices (see Q15.2), and the 'returns' inputs
  calculations <- frontier_calculator(returns[i], mu, sigma)
  
  #obtaining the first column of the resulting dataframe as the expected returns
  expected_return[i] <- calculations[1]
  
  #obtaining the second column of the resulting dataframe as the volatility
  volatility[i] <- calculations[2]
  
}

frontier_output <- cbind(expected_return,volatility)
# create and store the plot:
frontier1 <- ggplot(data=NULL, aes(y=expected_return, x=volatility)) + 
  geom_point(color="red",size=0.5) +
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(
    title = "The Efficient Frontier", subtitle="From 02/01/1990 to 31/12/2002",
    x = "Volatility (Standard Deviation)",
    y = "Expected Return"
  ) 

# display the plot:
frontier1

15.7 Now, rerun your code with sample moments for the three stocks MSFT, GE, and JPM for the sample period between 2/1/2003 and 31/12/2014.

Specifying the New Date Range Data

#creating a new dataframe from HPR_daily which contains only the selected stocks, filtered for the stated date range
stocks_HPR2 <- HPR_daily %>% 
  select (DATE, MSFT, GE, JPM) %>%
  filter (DATE >= as.Date("2003-01-02") & DATE <= as.Date("2014-12-31"))

Defining the Returns/Expected Returns Matrix

mu2 <- matrix(
  
  c(
    #specifying the 3 means to be taken from the date-filtered, wide-format stocks_HPR data frame
    mean(stocks_HPR2$MSFT), 
    mean(stocks_HPR2$GE), 
    mean(stocks_HPR2$JPM)
    
    ), 
  
  #specifying the number of columns and number of rows in the matrix object
  ncol=1, nrow=3)

#printing the returns/expected returns matrix, mu
mu2
##              [,1]
## [1,] 0.0004435250
## [2,] 0.0003160245
## [3,] 0.0007604899

Defining the Covariance Matrix

sigma2 <- round(var(
  
  stocks_HPR2 %>% 
    select(-DATE)
  
  ),6)

Obtaining the Global Minimum Variance Portfolio

A <- 2*sigma2
A <- cbind(A, c(1,1,1))
A <- rbind(A, c(1,1,1,0))

A
##          MSFT       GE      JPM  
## MSFT 0.000562 0.000290 0.000404 1
## GE   0.000290 0.000686 0.000620 1
## JPM  0.000404 0.000620 0.001336 1
##      1.000000 1.000000 1.000000 0
b <- matrix(c(0,0,0,1), ncol=1)
b
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    1
z <- solve(A) %*% b
z
##               [,1]
## MSFT  0.6077685438
## GE    0.4477280520
## JPM  -0.0554965958
##      -0.0004489864
m <- z[1:3,1]
m
##       MSFT         GE        JPM 
##  0.6077685  0.4477281 -0.0554966
m <- matrix(m, ncol=1)
return_minimum_variance <- 
  
  t(m) %*% mu2

paste0("The expected return of the minimum variance portfolio is ", return_minimum_variance)
## [1] "The expected return of the minimum variance portfolio is 0.000368848969184188"
variance_minimum_variance <-
  
  (
     t(m) %*% (sigma2 %*% m)
    
    )

paste0("The variance of the minimum variance portfolio is ", variance_minimum_variance)
## [1] "The variance of the minimum variance portfolio is 0.000224493215994253"

Obtaining the Efficient Portfolio in Relation to MSFT

A_new <- 2*sigma2
A_new <- cbind(A_new, mu2, c(1,1,1))
A_new <- rbind(A_new, c(t(mu2), 0, 0))
A_new <- rbind(A_new, c(1,1,1,0,0))
A_new
##             MSFT           GE          JPM               
## MSFT 0.000562000 0.0002900000 0.0004040000 0.0004435250 1
## GE   0.000290000 0.0006860000 0.0006200000 0.0003160245 1
## JPM  0.000404000 0.0006200000 0.0013360000 0.0007604899 1
##      0.000443525 0.0003160245 0.0007604899 0.0000000000 0
##      1.000000000 1.0000000000 1.0000000000 0.0000000000 0
b_new <- matrix(c(0,0,0,mu2[1], 1), ncol=1)
b_new
##             [,1]
## [1,] 0.000000000
## [2,] 0.000000000
## [3,] 0.000000000
## [4,] 0.000443525
## [5,] 1.000000000
z_new <- solve(A_new) %*% b_new
z_new
##               [,1]
## MSFT  0.6196071599
## GE    0.2712723663
## JPM   0.1091204738
##      -0.2944244854
##      -0.0003403883
x <- z_new[1:3,1]
x
##      MSFT        GE       JPM 
## 0.6196072 0.2712724 0.1091205
x <- matrix(x, ncol=1)
return_efficient <- 
  
  t(x) %*% mu2

paste0("The expected return of the efficient portfolio is ", return_efficient)
## [1] "The expected return of the efficient portfolio is 0.000443524991724595"
variance_efficient <-
  
  (
        t(x) %*% (sigma2 %*% x)
    
    )

paste0("The variance of the efficient portfolio is ", variance_efficient)
## [1] "The variance of the efficient portfolio is 0.000235486440749376"
PTFOLIO_new_covariance <- t(m) %*% (sigma2 %*% x)

paste0("The covariance between the efficient portfolio and the global minimum variance portfolio is ", PTFOLIO_new_covariance)
## [1] "The covariance between the efficient portfolio and the global minimum variance portfolio is 0.000224493215994253"

Calculating and Plotting the Efficient Frontier

frontier_calculator2 <- function(return, mu2, sigma2){
  
  #we define matrix A
  A <- 2*sigma2
  A <- cbind(A, mu2, c(1,1,1))
  A <- rbind(A, c(t(mu2), 0, 0))
  A <- rbind(A, c(1,1,1,0,0))
  
  #we define matrix b
  b = matrix(c(0, 0, 0, return, 1), ncol=1)
  
  #we solve for z
  z = solve(A) %*% b
  
  #we extract the portfolio weights
  x = z[1:3,1]
  
  #we return the expected return and volatility (standard deviation) of the portfolio
  return(
    
    c(
      
      #expected return
      t(x) %*% mu2, 
      
      #standard deviation (volatility)
      sqrt(t(x) %*% sigma2 %*% x)
      
      )
  
    )
  
}
#setting the range and step of inputted 'returns' values
returns2  <- seq(
  
  from = 0, 
  to = 0.003, 
  by=0.00005)

#creating empty vectors to be populated with expected return and volatility values
volatility2 = rep(NA, length(returns2))
expected_return2 = rep(NA, length(returns2))

#applying the 'frontier_calculator' function by looping it over all the values in the 'returns' list to obtain the volatilities and expected returns required for the efficient frontier plot

for (i in 1:length(returns2)){
  
  #we run the function using our previously calculated mu and sigma matrices (see Q15.2), and the 'returns' inputs
  calculations2 <- frontier_calculator2(returns2[i], mu2, sigma2)
  
  #obtaining the first column of the resulting dataframe as the expected returns
  expected_return2[i] <- calculations2[1]
  
  #obtaining the second column of the resulting dataframe as the volatility
  volatility2[i] <- calculations2[2]
  
}

frontier_output2 <- cbind(expected_return2,volatility2)
# create and store the plot:
frontier2 <- ggplot(data=NULL, aes(y=expected_return2, x=volatility2)) + 
  geom_point(color="red",size=0.5) +
  theme_fivethirtyeight() + theme(axis.title=element_text())+
  labs(
    title = "The Efficient Frontier", subtitle="From 02/01/2003 to 31/12/2014",
    x = "Volatility (Standard Deviation)",
    y = "Expected Return"
  ) 

# display the plot:
frontier2

15.8 Finally, compare your results for the three assets across the two sample periods. Comment on potential problems that might arise when you based investment decisions on your analysis. Also discuss potential solutions to the problems mentioned.

Comparing the mean and covariance returns for MSFT, GE, and JPM across the two sample periods

Mean Returns

In terms of returns, we find that the while the average return for both MSFT and GE is greater in the earlier (2/1/1990 - 31/12/2002) than in the later (2/1/2003 and 31/12/2014) period, for JPM the average return is almost identical across the two periods - and only marginally higher in the second. For the mean return, MSFT ranked the highest (at 0.00093) in the first sample period while JPM reported the highest returns (at 0.00076), in the second. Though both MSFT and GE reported lower average returns during the second period (with MSFT experiencing the most substantial decline in returns), JPM actually reported a minimal increase of 0.00004 in its returns.

Covariances

In terms of volatility, the covariance estimates also differ considerably between these two periods, with the covariance between MSFT returns and both GE and JPM returns decreasing moderately, and the covariance between GE and JPM increasing substantially, from the first to the second period.

Efficient Frontiers

The rightward shift in the efficient frontier from the first to the second sample period indicates that the risk-return relationship (the Sharpe ratio) has worsened over time. Indeed, in the latter 2003-2014 period, an investor in the portfolio of these risky assets must take on more risk (greater returns volatility) for the same expected return, than an investor in the former 1990-2002 period. Equivalently, a similarly risk-averse investor in the second period would have to settle for a lower expected return than an investor holding the portfolio in the first period.

These notable differences, in (mean) expected return and the covariance of the returns of the 3 assets in the portfolio, between the periods mean that while we would expect roughly equivalent efficient frontiers between the two periods (due to random variation cancelling over full-period means), we instead find a considerably lower Sharpe Ratio in the latter period. Such a substantial difference in the efficient frontier (and thus the global minimum variance portfolio) implies that we experience a relatively high level of estimation (measurement) error in our calculations.

frontier1+frontier2

Potential Problems with Basing Investment Decisions on Our Analyses

The kind of estimation error we observe is to be expected, since financial markets rarely fit the assumptions of mean-variance models and the efficient frontier is dynamic, so period-specific mean statistics are suboptimal for calculating and plotting the true efficient frontier at any given time (and therefore informing present-day, real-time investment decisions)

Further, historical measurements of returns and volatility are used, which are subject to unknown changes at the time of calculation, and even if this model were applied, investors may not have free access to reliable data on returns and volatility to make sensible judgments.

Potential Solutions

To investigate and address the problem of observed estimation error, we can perform robustness checks by re-calculating the efficient frontier for: a greater number of sub-periods and sub-periods of different lengths. It is also likely that, because we are evaluating a frontier of only 3 stocks, including more assets in the portfolio would likely “smooth” out observed aggregate differences between periods, allowing more granular estimation of the efficient frontier and its period-by-period evolution.

More practically, we can employ investment decision making processes based on more realistic real-time information: by developing allocations based on market conditions and incorporating the use of strategic diversification, we can manage risk while dynamically optimizing portfolio allocation on the basis of multiple factors in addition to simple risk and return. In particular, further factors to consider include the transaction and information costs faced by investors in real-time markets.

–END–